Extraction and Quantification of Pack-years and Classification of Smoker Information in Semi-structured Medical Records
Abstract
Electronic medical records contain a wealth of information that is potentially invaluable to many interested parties. However, most of these documents are semi-structured and consist of fragmented English free text, region-specific templates, and clinical sublanguage, among other things, which makes it difficult to apply existing Natural Language Processing tools to them directly and to extract that information. In this work, we focus on a set of medical records pertaining to Rheumatoid Arthritis patients and present a pattern-based methodology for extracting and quantifying pack-year information. We also introduce an extension to those patterns that classifies individual instances within these documents into a set of predefined smoker status classes. Since, to the best of our knowledge, our effort to extract pack-years from medical documents is the first of its kind, we evaluate our approach on a manually selected document collection and present promising results. We also evaluate our instance classification approach using an additional document collection. Appearing in Proceedings of the 28th International Conference on Machine Learning, Bellevue, WA, USA, 2011. Copyright 2011 by the author(s)/owner(s).
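As a rough illustration of what pattern-based pack-year extraction can look like, the Python sketch below uses two regular expressions. It relies on the standard definition of a pack-year (packs smoked per day multiplied by years smoked). The patterns and the function name `extract_pack_years` are hypothetical and chosen for illustration only; they are not the patterns described in the paper.

```python
import re

# Hypothetical patterns for illustration; NOT the patterns used in the paper.
# Pattern 1: an explicit mention such as "30 pack-year history".
DIRECT = re.compile(r"(\d+(?:\.\d+)?)\s*[- ]?\s*pack[- ]?years?", re.IGNORECASE)

# Pattern 2: a rate followed by a duration, such as
# "1.5 packs per day for 20 years" or "2 ppd x 10 yrs".
RATE_DURATION = re.compile(
    r"(\d+(?:\.\d+)?)\s*(?:packs?|ppd)\b.*?(\d+(?:\.\d+)?)\s*(?:years?|yrs?)",
    re.IGNORECASE,
)


def extract_pack_years(text):
    """Return a pack-year estimate for a text snippet, or None if no pattern matches."""
    match = DIRECT.search(text)
    if match:
        # The record states the pack-year value directly.
        return float(match.group(1))
    match = RATE_DURATION.search(text)
    if match:
        # Quantify as packs per day multiplied by years smoked.
        return float(match.group(1)) * float(match.group(2))
    return None
```

A real system would presumably need many more patterns (cigarettes per day, ranges, quit dates) as well as handling for negation and template text; this sketch only shows the extract-then-quantify idea.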
Similar resources
Automatic Extraction of Semantic Content from Medical Discharge Records
Semi-structured medical texts like discharge summaries are rich sources of information that can exploit the research results of physicians by performing statistical analysis of similar cases. In this paper we introduce a system based on Machine Learning algorithms that successfully classifies discharge records according to the smoking status of the patient (we distinguish between current smoker...
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is a process of automatically providing a structured representation from an unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) which has been intensified by the increased volume of information and heterogeneity, and non-structured form of it. One of the core information extraction tasks is relation extraction wh...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Comparison of Three Information Sources for Smoking Information in Electronic Health Records
OBJECTIVE The primary aim was to compare independent and joint performance of retrieving smoking status through different sources, including narrative text processed by natural language processing (NLP), patient-provided information (PPI), and diagnosis codes (ie, International Classification of Diseases, Ninth Revision [ICD-9]). We also compared the performance of retrieving smoking strength i...
Smoking Pattern in Family Members of Smokers in Slums of Surat City, Western India
Background: The relationship between becoming a smoker and having smoker parents, siblings, and relatives is still uncovered in India. The influences of a smoking role model in a family on smoking habits of individuals are yet to be revealed. This study aimed to understand the relationship of smoking abuse of a person with smoking of their family members. Methods: This community-based cross-sec...
Publication date: 2011